Necessary Packages:
This project explores the gender pay divide among Level 8 graduates in the fields of Natural Sciences, Mathematics, and Statistics in Ireland. Focusing specifically on the salary growth over the course of a career, we analyze how pay disparities manifest at three key stages: the beginning of graduates’ careers, across-career progression, closing with a spotlight on the final years captured within the data. using this CSO data, we examine trends and gaps in salary between male and female graduates. Our analysis uses data visualization techniques learned during ST302 to provide a clearer understanding of one of the stories hidden within the given dataset.
Initially we are given data with data with 12,000 rows(/observations)
and 7 columns(/variables).
These rows are as follows:
Since we know all of our data relates to “Salary in Natural Sciences, Mathematics and Statistics for level 8 graduates”, we can remove the columns relating to NFQ Level and field of study. These just hold the same information and so are redundant. They only making our dataset larger, so that it’s takes more memory and is more computationally expensive to work with. Hence after loading in the dataset we remove these columns.
After this, we also observed that there is only one observation where gender == “Other gender”. Since we cannot make any meaningful analysis off of one data point, this is another opportunity for us to save memory and reduce our artificially large dataset.
Finally, there is a huge amount of NA’s built into this dataset. The structure leaves spaces for salaries of individuals 5, 6, 7, and so on ‘years from graduation’, when that data did not exsist at the time of the collection. As a result of this na.omit will give us a much clearer idea of the ‘true’ dimensions of our data.
We thought it might also be handy to present the data in a wider format, this is much easier to read - instead of “statistic” and “measure of statistic”, we are presented with very clear 25th/50th and 75th percentile earnings for each combination of year/years since graduation and gender combination. +Plus consistent shorter column names makes for easier usage.
There is also adjustments that may need to be made for individual plots but we felt that these adjustments would make the data more suitable for all group members to work with.
Universal adjustments:
load("gradMS.Rdata")
gradMS <- gradMS[,-c(4,5)] # Removes 2 columns / 24,000 cells
gradMS <- gradMS %>%
filter(Gender != "Other gender") # this removes 300 rows / 1,500 cells
gradMS <- gradMS %>%
na.omit() # this removes 405 rows / 2,025 cells
gradMS_wider <- gradMS |> # we use a base pipe here because magrittr pipes play weird with '.'
pivot_wider(names_from = Statistic, values_from = .) |> # resulting table is 165 rows x 6 columns
rename(
year = Graduation.Year,
gender = Gender,
yearsGrad = Years.Since.Graduation,
p25 = `P25 Earnings of Graduates`,
p50 = `P50 Earnings of Graduates`,
p75 = `P75 Earnings of Graduates`
) |> # consistent informative names.
mutate(year = as.factor(year),
yearsGrad = as.factor(yearsGrad)) # more appropriate encodingEach contributor has given their interpretation follow their graph.
Here we’d like to take the opportunity to acknowledge visualisation
decision we’ve made and applied to all of our plots. Also - we have used
colours from the ‘paired’ pallete from color brewer, these colors are
distinguishable if the images are desaturated and colourblind safe for
Deuteranope, Protanope and Tritanope colourblindness.
This was verified @ http://hclwizard.org
pink and blue adjacent colors are used to differentiate between male and
female as they’re intuitive, it’s easier for a user to recognize as
opposed to rationalize.
We’ve also applied a minimal theme across all plots as if reduces visual
noise - reducing the cognitive load on viewers and focusing on attention
on what matters.
Question 1 specific adjustments:
gradMS_Q1 <- gradMS_wider %>%
filter(gender != "All genders") %>% # We're only interested in the difference
filter(yearsGrad == 1) %>% # We're only interested in newly grads
select(!yearsGrad) %>% # taking 'all but the yearsGrad' column
pivot_wider(names_from = gender, values_from = c(p25, p50, p75)) %>%
mutate(
payDifLower = (p25_Male / p25_Female) - 1,
payDifMedian = (p50_Male / p50_Female) - 1,
payDifUpper = (p75_Male / p75_Female) - 1
) # Use of a ratio allows for comparison independent of scale.
gradMS_Q1_final <- gradMS_Q1 %>%
pivot_longer(
cols = c(payDifLower, payDifMedian, payDifUpper),
names_to = "quartile",
values_to = "payRatio"
) %>% # Gives us a row for each year's lower/middle/upper quartiles
mutate(quartile = factor(quartile,
levels = c("payDifUpper", "payDifMedian", "payDifLower"),
labels = c("upper quartile", "median", "lower quartile"))) %>%
select(year, quartile, payRatio) # Only saving the information necessary for the plot
gradMS_Q1_final <- gradMS_Q1_final %>%
mutate(tooltip_text = paste("Ratio:", round(payRatio, 2))) # Added for interactivityQuestion 1 plot:
p1 <- gradMS_Q1_final %>%
ggplot(aes(x = year, y = payRatio, fill = payRatio < 0)) +
geom_bar_interactive(
stat = "identity",
aes(tooltip = tooltip_text)
) +
facet_grid(quartile ~ .,
labeller = labeller(quartile = c(
"payDifLower" = "Lower",
"payDifMedian" = "Median",
"payDifUpper" = "Upper"
)),
switch = "y") +
scale_fill_manual(values = c("#1F78B4FF", "#FB9A99FF"), # Male, Female
name = "Bias",
labels = c("Male", "Female")) +
labs(title = "New grad pay diff @ 25th/50th/75th percentile earnings",
subtitle = "For new maths/stats graduates",
y = "",
x = "Graduation Year") +
theme_minimal() +
theme(axis.text.y = element_blank(),
strip.text.y.left = element_text(angle = 0, hjust = 1))
# removes measures on y-axis, reducing visual noise
girafe(ggobj = p1) |>
girafe_options(opts_sizing(rescale = TRUE, width = .75))Analysis of gender pay gap of newly-graduated maths/stats students across different percentiles.
This plot illustrates the gender pay gap for recent mathematics and statistics graduates at the 25th, 50th, and 75th earnings percentiles, measured one year after graduation.
Overall, a consistent bias toward male graduates is evident across most years and percentiles, particularly at the 75th percentile, where men consistently earn more than women in the early and later years. The median (50th percentile) also reflects a male pay advantage in most cohorts.
A notable exception occurs around 2015, where the trend temporarily reverses:
At the 25th percentile, there is a pronounced female advantage, with women earning more than men in two consecutive years.
This reversal is present to a lesser extent at the median, and absent at the 75th percentile, suggesting the pay gap reversal was concentrated among lower-earning graduates.
By 2018, the male advantage reasserts itself across all percentiles, particularly at the lower end of the payscale.
From this graphic we may gather that there have been short-term
changes in salary practices.
The persistent bias at the top end informs us to the structural dynamics
disproportionately benefiting men in higher-paying positions.
Question 2 plot:
gradMS %>%
filter(Gender %in% c("Male", "Female")) %>% # Keep only Male and Female
ggplot(aes(x = cut(Years.Since.Graduation, # Divide years since grad into subdivisions with cut function
breaks = seq(0, 10, by = 2),
labels = c("1–2", "3–4", "5–6", "7–8", "9–10"),
include.lowest = TRUE),
y = .,
fill = Gender)) +
geom_boxplot(position = position_dodge(width = 0.75)) + # side by side boxplots
scale_fill_manual(values = c("Male" = "#1F78B4FF", "Female" = "#FB9A99FF"))+
theme_minimal() +
labs(title = "Earnings by gender and Graduation Year (Grouped Every 2 Years)",
x = "Years Since Graduation (Grouped)",
y = "Earnings") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))Tracking graduate earnings progression over time by gender
This plot tracks and displays the difference in earnings for graduates of the mathematical and statistical sciences over a 10 year period. The Years Since Graduation variable from the data set has been modified by grouping every two years together, which is displayed across the x-axis of the plot. The variable Earnings is displayed along the y-axis. The code has been filtered to include only the Male and Female variables in the gender variable.
Upon initially viewing the plot, it is clear to see an upward trend in earnings over time across both Male and Female genders. As displayed on each box plot, the medians show an increase across each grouped year. For example, the minimum earnings for Females after 1-2 years is 275.00, while after years 9-10 the minimum is 740.00. A significant increase. The interquartile range (spread of earnings) is smaller for newer graduates, however there is a notable increase in the IQR for the later years. Potential reasons for this increase include better access to more senior high paying roles that require more experience.
The next most obvious observation from the plot is the difference in earnings between the Male and Female genders. In the initial years since graduation (year 1-2), the difference in earnings is minimal and not significant. However, as the years progress there is a clear difference in the median earnings between the two sexes.
Question 3 specific adjustments:
gradMS_long <- gradMS_wider %>%
filter(gender %in% c("Male", "Female")) %>%
pivot_longer(cols = c(p25, p50, p75),
names_to = "percentile",
values_to = "earnings")Question 3 plot:
q3_plot <- gradMS_long %>%
ggplot(aes(x = yearsGrad, y = earnings, color = gender)) +
geom_point(alpha = 0.4) +
geom_smooth(aes(group = gender), method = "lm", se = FALSE) +
facet_wrap(~percentile, scales = "free_y", labeller = as_labeller(c(
p25 = "25th Percentile", p50 = "Median", p75 = "75th Percentile"
))) +
scale_color_manual(values = c("Male" = "#1F78B4", "Female" = "#FB9A99")) +
labs(
title = "Earnings Growth Over Time by Gender and Percentile",
subtitle = "Linear trends across 25th, 50th, and 75th percentiles",
x = "Years Since Graduation",
y = "Earnings (€)",
color = "Gender"
) +
theme_minimal() +
theme(legend.position = "bottom")
ggplotly(q3_plot)Examining comparative growth rates of earnings between genders across all pay levels
This graph illustrates the growth rate of earnings between male and female graduates across the 25th, 50th, and 75th percentiles over a ten-year period post-graduation. I created this visualization to analyse the question, is the growth rate of earnings the same for university graduates of differing genders? What we see is a consistent pattern: male graduates tend to experience steeper income growth than their female counterparts, regardless of whether they start in the lower, middle, or upper end of the earnings distribution. At every level, 0 years after graduation, Males and Females seem to have the same earnings, as time progresses Males earn more
For the P25 Level the growth rate is faster for males but relatively not by much
For the P50 level the growth rate is steeper than the P25 level
For the P75 level the growth rate is much steeper than the other P-levels, and every year after 2.5 years we can see a gap between the genders.
So we can see not only are males earning more, their earnings are growing faster at every pay level
Question 4 specific adjustments:
pay_gap_later <- gradMS_wider %>%
filter(gender != "All genders") %>%
pivot_longer(cols = c(p25, p50, p75),
names_to = "percentile",
values_to = "earnings") %>%
pivot_wider(names_from = gender, values_from = earnings) %>%
mutate(
genderGap = (Male - Female) / Male * 100,
percentile = factor(percentile, levels = c("p25", "p50", "p75"),
labels = c("25th", "50th", "75th")),
yearsGrad = as.numeric(as.character(yearsGrad)) # for filtering
) %>%
filter(yearsGrad >= 5 & yearsGrad <= 10)Question 4 plot:
p4 <- ggplot(pay_gap_later, aes(x = yearsGrad, y = genderGap, colour = "#FB9A99FF")) +
geom_jitter(colour = "#FB9A99FF", alpha = 0.8) +
geom_smooth(method = "lm", se = FALSE, colour = "black") +
labs(title = "Gender Pay Gap as a percentage ratio 5-10 years post graduation",
subtitle = " percentage difference in earnings" ,
x = "Years since graduation",
y = "Gender Pay Gap (%)") +
#mean line
stat_summary(fun = mean, geom = "line", aes(group = 1), colour = "#1F78B4FF", linewidth = 1.2) +
#adjusting ranges
scale_x_continuous(breaks = seq(5, 10, 1)) +
theme_minimal()
ggplotly(p4)This plot illustrates the gender pay gap comparing males and females who have graduated 5 to 10 years ago, providing a valuable insight into the pay gap later on in the career of graduates. From producing and analysing this plot it can be seen that the gender pay gap increases with males earning more than females as careers progress, ranging from a <1 percent difference to a >15 percent difference. This gradual increase suggests that differences in pay may not only be consistent but also grow with time , showcasing gender based pay inequalities extending past entry level roles.
From the above analysis it is clear that there a pronounced pay gap between Males and Females, favoring Males across all percentiles.
As evidenced in question 1 (Owen), the pay gap is prevalent in the immediate years following graduation, particularly at the 75th percentile. There is a brief period where this trend reverses, however it is short lived and the Male advantage reestablishes itself. Question 3 (Warren) displays this trend by plotting the growth rate across both genders, revealing that Males not only earn more than there Female counterparts, but that their rate of earnings increase faster over time. Plot 4 (Oisin) reveals the pay gap in later years since graduation. This fact if further shown in Plot 2 (David) where there is a clear difference in median earnings from years 3 to 10, with earnings bias towards Males. From this plot it is also clear that earnings substantially increase over the 10 year period for both genders.